PM566 Midterm

Changqing Su

2020/10/9

Introduction

At the end of 2019, a novel coronavirus was identified as the cause of a cluster of pneumonia cases in China. It spread rapidly, resulting in an epidemic throughout the world. In February 2020, the World Health Organization designated the disease COVID-19, which stands for coronavirus disease 2019. This report explores the association between COVID-19 deaths and state, sex, and age group.

Methods

The data were obtained from the US Centers for Disease Control and Prevention (CDC). They include deaths involving coronavirus disease 2019 (COVID-19), pneumonia, and influenza reported to NCHS, broken down by sex, age group, and state.

We calculate the proportionate mortality ratio (PMR) due to COVID-19 as COVID-19 deaths divided by total deaths.
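As a quick check of this definition, the minimal sketch below recomputes the ratio for a single row. The helper name `pmr` is hypothetical and not part of the analysis code; the two counts are the California values reported in the state table of this report.

```r
# Hypothetical helper (an assumption for illustration, not the report's code):
# proportionate mortality ratio = COVID-19 deaths / total deaths
pmr <- function(covid_deaths, total_deaths) {
  covid_deaths / total_deaths
}

# California counts as reported in the state table of this report
pmr_ca <- pmr(15534, 202764)
print(round(pmr_ca, 4))  # 0.0766
```

A value of about 0.077 means that roughly 7.7% of all deaths recorded for California in this data set involved COVID-19.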

library(data.table)
library(dtplyr)
library(dplyr)
library(leaflet)
library(tidyverse)
library(ggplot2)
#Read file
covid = fread("data/Provisional_COVID-19_Death_Counts_by_Sex__Age__and_State.csv")

#Calculate proportionate mortality ratio due to COVID-19
covid$PMR=covid$`COVID-19 Deaths`/covid$`Total Deaths`

Compare the proportionate mortality ratio among states.

Here, we create a map of proportionate mortality ratio among the states.

#load map data
library(tigris)
states <- states(cb=T)
#construct simple table of Covid data related to States
covid_state=covid[which(covid$Sex == "All Sexes" & covid$State != "United States" & covid$`Age group`=="All Ages"),]
covid_simp=subset(covid_state, select = c("State","COVID-19 Deaths","Total Deaths","PMR"))
covid_simp= covid_simp[order(-covid_simp$`Total Deaths`),]
knitr::kable(head(covid_simp))
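| State | COVID-19 Deaths | Total Deaths | PMR |
|:-------------|------:|-------:|----------:|
| California | 15534 | 202764 | 0.0766112 |
| Florida | 14828 | 162904 | 0.0910229 |
| Texas | 16560 | 160010 | 0.1034935 |
| Pennsylvania | 8402 | 98410 | 0.0853775 |
| Ohio | 4522 | 86595 | 0.0522201 |
| Illinois | 7930 | 82908 | 0.0956482 |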
covid_simp= covid_simp[order(-covid_simp$PMR),]
knitr::kable(head(covid_simp))
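| State | COVID-19 Deaths | Total Deaths | PMR |
|:---------------------|------:|------:|----------:|
| New York City | 20763 | 62910 | 0.3300429 |
| New Jersey | 14362 | 66941 | 0.2145471 |
| Connecticut | 4437 | 23033 | 0.1926367 |
| Massachusetts | 8094 | 47456 | 0.1705580 |
| District of Columbia | 760 | 4896 | 0.1552288 |
| New York | 11665 | 81391 | 0.1433205 |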

According to the first table, California has the most total deaths but a relatively low proportionate mortality ratio, which suggests that its large death count mainly reflects its large population. According to the second table, New York City has the highest proportionate mortality ratio due to COVID-19. This may be related to its high population density, since New York City has a large population in a small area, although these data cannot confirm that.

#covid_state$State[which(covid_state$State %in% states$NAME ==F)]

#Combine New York City with New York State.
covid_state[33,7:12]=covid_state[33,7:12]+covid_state[34,7:12]
covid_state=covid_state[-34, ]
covid_state$PMR=covid_state$`COVID-19 Deaths`/covid_state$`Total Deaths`
colnames(covid_state)[4]="NAME"

mergedata=merge(x = states, y = covid_state, by = "NAME", all.x = TRUE)

#Construct PMR map
pal <- colorBin("YlOrRd", domain = mergedata$PMR)
leaflet() %>%
  addProviderTiles("CartoDB.Positron") %>%
  setView(-98.483330, 38.712046, zoom = 4) %>%
  addPolygons(
    data = mergedata,
    fillColor = ~pal(mergedata$PMR),
    fillOpacity = 0.7,
    weight = 0.2,
    smoothFactor = 0.2
  ) %>%
  addLegend(pal = pal,
            values = mergedata$PMR,
            position = "bottomright",
            title = "PMR")

After New York City is combined with the rest of New York State, the map shows that New York State has the highest proportionate mortality ratio due to COVID-19. This means that COVID-19 accounts for a larger share of all deaths in New York than in any other state.
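The step that merges New York City into New York State relies on fixed row and column positions. A minimal sketch of a name-based alternative follows. It is an illustration on three rows copied from the tables in this report, not the method used to build the map, and the shortened column names `covid_deaths` and `total_deaths` are assumptions made for the sketch.

```r
library(data.table)

# Toy rows copied from the state tables in this report
# (column names shortened for this sketch; they are not the CDC names).
dt <- data.table(
  State        = c("New York City", "New York", "New Jersey"),
  covid_deaths = c(20763, 11665, 14362),
  total_deaths = c(62910, 81391, 66941)
)

# Relabel New York City so that it falls into the New York State group
dt[State == "New York City", State := "New York"]

# Sum the counts per state, then recompute the PMR from the summed counts
combined <- dt[, .(covid_deaths = sum(covid_deaths),
                   total_deaths = sum(total_deaths)),
               by = State]
combined[, PMR := covid_deaths / total_deaths]

print(combined)
# New York: 32428 / 144301, about 0.2247
```

Recomputing the ratio from the summed counts matters here: averaging the two separate PMR values would give the wrong answer because the two areas have different numbers of total deaths.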

#Construct COVID-19 Deaths map
pal2 <- colorBin("YlOrRd", domain = mergedata$`COVID-19 Deaths`)
leaflet() %>%
  addProviderTiles("CartoDB.Positron") %>%
  setView(-98.483330, 38.712046, zoom = 4) %>%
  addPolygons(
    data = mergedata,
    fillColor = ~pal2(mergedata$`COVID-19 Deaths`),
    fillOpacity = 0.7,
    weight = 0.2,
    smoothFactor = 0.2
  ) %>%
  addLegend(pal = pal2,
            values = mergedata$`COVID-19 Deaths`,
            position = "bottomright",
            title = "COVID-19 Deaths")

According to this map, New York State has the most COVID-19 deaths (more than 30,000). Texas and California follow, each with more than 15,000 COVID-19 deaths. Compared with the previous map, however, California does not have a high PMR. This suggests that California's large number of deaths reflects its large population.

Compare the proportionate mortality ratio between sexes and age groups.

#Keep national, all-sex rows for the eleven age groups used in the table
age_groups=c("Under 1 year","1-4 years","5-14 years","15-24 years","25-34 years","35-44 years","45-54 years","55-64 years","65-74 years","75-84 years","85 years and over")
covid_2=covid[which(covid$Sex == "All Sexes" & covid$State == "United States" & covid$`Age group` %in% age_groups),]

covid_2=na.omit(covid_2)

covid_3=subset(covid_2, select = c("Age group","COVID-19 Deaths","Total Deaths","PMR"))
                
knitr::kable(covid_3) 
| Age group | COVID-19 Deaths | Total Deaths | PMR |
|:------------------|------:|-------:|----------:|
| Under 1 year | 22 | 12092 | 0.0018194 |
| 1-4 years | 15 | 2307 | 0.0065020 |
| 5-14 years | 35 | 3597 | 0.0097303 |
| 15-24 years | 369 | 23153 | 0.0159375 |
| 25-34 years | 1541 | 47522 | 0.0324271 |
| 35-44 years | 4039 | 67177 | 0.0601247 |
| 45-54 years | 10627 | 123171 | 0.0862784 |
| 55-64 years | 25421 | 281571 | 0.0902827 |
| 65-74 years | 42950 | 426332 | 0.1007431 |
| 75-84 years | 52618 | 519488 | 0.1012882 |
| 85 years and over | 61172 | 643778 | 0.0950203 |

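Here, we can see that COVID-19 accounts for a larger share of deaths in older age groups: the PMR rises from about 0.2% among infants under 1 year to about 10% among people aged 65 and over.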

age=c(0,1,5,15,25,35,45,55,65,75,85)
PMR=covid_2$PMR
ggplot(data.frame(age,PMR), aes(x=age,y=PMR,group=""))+geom_point()+geom_line()

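The plot also shows that the proportionate mortality ratio increases with age up to the 75-84 years group and then dips slightly in the 85 years and over group. One plausible explanation for the dip is that other causes of death make up a growing share of deaths at the oldest ages.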

covid_2=covid[which(covid$Sex != "All Sexes" & covid$State == "United States" & covid$`Age group`=="All Ages"),]


covid_2=na.omit(covid_2)

covid_3=subset(covid_2, select = c("Sex","COVID-19 Deaths","Total Deaths","PMR"))
                
knitr::kable(covid_3) 
| Sex | COVID-19 Deaths | Total Deaths | PMR |
|:--------|-------:|--------:|----------:|
| Male | 107472 | 1122212 | 0.0957680 |
| Female | 91332 | 1027893 | 0.0888536 |
| Unknown | 5 | 83 | 0.0602410 |

Here, we can see that more males than females died from COVID-19 (107,472 versus 91,332), and that the PMR is slightly higher for males (0.096 versus 0.089).
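To put a number on the difference between the sexes, the minimal sketch below computes the ratio of the male PMR to the female PMR. It uses only the counts reported in the table above; the data frame `sex_counts` and its shortened column names are assumptions made for illustration.

```r
# Counts copied from the table by sex in this report
# (the "Unknown" row is left out because it contains only 83 deaths).
sex_counts <- data.frame(
  Sex          = c("Male", "Female"),
  covid_deaths = c(107472, 91332),
  total_deaths = c(1122212, 1027893)
)

# PMR per sex, then the male-to-female ratio of the two PMRs
sex_counts$PMR <- sex_counts$covid_deaths / sex_counts$total_deaths
pmr_ratio <- sex_counts$PMR[sex_counts$Sex == "Male"] /
  sex_counts$PMR[sex_counts$Sex == "Female"]

print(round(pmr_ratio, 2))  # 1.08
```

A ratio of about 1.08 indicates that the share of deaths involving COVID-19 was roughly 8% higher among males than among females in this data set. It is a descriptive comparison only and does not adjust for age.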

Conclusion

From the maps we can conclude that New York State has the highest proportionate mortality ratio due to COVID-19, as well as the largest number of COVID-19 deaths. More males than females died from COVID-19, and the PMR was slightly higher for males. COVID-19 accounted for the largest share of deaths among older people (65 years and older).